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Abstract 

The Fay-Herriot (FH) model is widely used in small area estimation and uses auxiliary 
information to reduce estimation variance at undersampled locations. We extend the type of 
covariate information used in the FH model to include functional covariates, such as social- 
media search loads, or remote-sensing images (e.g., in crop- yield surveys). The inclusion of 
these functional covariates is facilitated through a two-stage dimension reduction approach 
that includes a Karhunen-Loeve expansion followed by stochastic search variable selection. 
Additionally, the importance of modeling spatial autocorrelation has recently been recognized 
in the FH model; our model utilizes the conditional autoregressive class of spatial models in 
addition to functional covariates. We demonstrate the effectiveness of our approach through 
simulation and through the analysis of American Community Survey data. We use Google 
Trends search curves as functional covariates to analyze changes in rates of household Spanish 
speaking in the eastern half of the United States. 
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1 Introduction 



The Fay-Herriot (FH) model flFay and Herriot 



small area estimation (SAE) (e.g.. 



Jiang et al. 



2011 



19791) is one of the primary tools used in 



Roy, 



20071 . 



You and Zhoul . 



2011 



among 



others). Model-based estimates are widely used in SAE as they represent a way to b orrow 



stren gth across locations and thereby reduce the variance of the small area estimate ( iRao 



20031 ). These models utilize scalar auxiliary information to obtain an "indirect" estimate of 



the small-area variable of interest, rather than a direct survey estimate. Critically, the FH 
model is structured in such a way as to guarantee model-based variance r eduction in the 

5, Chapter 4). 



2003 



variable of interest relative to that of the direct survey estimate flRao . 

As government budgets remain flat or decline, auxiliary information that is relatively 
inexpensive and readily available, but still representative of the population under considera- 
tion, is of substantial interest. Functional covariates based on internet sources, social media, 
or other sources (e.g., remotely sensed image data) may augment or replace scalar auxiliary 
information for a wide variety of surveys. The advantage of these types of covariates is that 
they are often readily available and provide significant information related to a diverse set 
of demographic and other survey outcomes. For instance, Twitter tweets or Google searches 
can be associated with a precise location and searched for specific terms or hashtags. Alter- 
natively, dimension-reduced representations of satellite imagery could be used as auxiliary 
information in modeling outcomes from agricultural surveys. 

Not surprisingly, many federal agencies (including Bureau of Labor Statistics among 
others) have now realized the potential importance of harnessing these massive, readily 
available data sources. Methodologies relying on "web-scraping" for the collection of data 
and use of retail scanner and social-media data have emerged as avenues of particular interest 
(e.g., see the article by Michael W. Horrigan in Amstat News, January 2013, pp. 25-27). 
Consequently, it is extremely important that sound and effective statistical methodology be 
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developed to accommodate this abundantly rich class of "Big Data" resources. 

Functional data analysis (FDA) methodology allows for the use of curves, images, and 
other "objects" as eithe r independen t or dependent variables in a statistical framework (e.g., 



Ramsay and Silverman 



2005 



20061 ). The use of FDA in a (generalized) linear statistical 



modeling framework is well deve 
the last decade. For example, 



oped, with a subst antial amount of research occur ring; over 



Jamesl (j2002f ) and 



Muller and Stadtmuller 



generalized linear models with functional cova riates. In additio n, 



such models in a longitudinal framework and 



Goldsmith et al. 



YaoetaL 



20051) develop 



(120051 ) consider 



( 1201 ll ) develop penalization 



meth ods for regression-model se 



tive, 



Baladandavuthapani et al. 



Crainiceanu et al. 



ection with functional covariates. From a Bayesian perspec- 



(120081 ) work with spatially correlated functional data and 
(120091 ) develop multilevel functional regression models. 
Survey sampling followed by SAE is commonly implemented by official-statistics agencies, 
but in this article we propose a shift from the usual FH model in two ways. We propose a 
spatial FH model that uses functional and/or image covariates as auxiliary information. Ex- 
amples of such covariates include Google Trends curves, Twitter hashtag counts, or remotely 
sensed satellite imagery. The use of social me dia and other internet-based predictors is a 
developing field (see, e.g., 



Signorini et al 



20111 ). However, statistical modeling of such data 



in the context of SAE using functional covariates in spatial FH models remains undeveloped. 

Our model proceeds from a Bayesian perspective and, thus, it allows a natural quan- 
tification of uncertainty through the posterior distributions. Further, we demonstrate the 
importance of accounting for spatial correlation often present in SAE. The Bayesian paradigm 
provides a natural hierarchical framework for incorporating latent spatial random effects. In 
particular, we propose a FH model that utilizes conditional autoregressive (CAR) random 



effects. Final 



([Google, 



y, we use functional covariates that are curves generated from Google Trends 



20121 ) . in a statistical model of state- level American Community Survey (ACS) data 



flhttp : //www . census . gov/acs p . 



The ACS is an on-going survey performed by the United States Census Bureau that 
provides single-year and multiyear estimates for a large number of demographic variables. 
Publicly available data (known as "multiyear estimates") provide one-year estimates for ar- 
eas with large populations, and three- and five-year period estimates for smaller areas, such 
as census tracts. The public-use microdata samples (PUMS) are also available for a diverse 
set of variables and can be used to model smaller geographies, known as public use microdata 
(PUMAs) 



(see http: //www. census .gov/acs/www/data_documentation/public_use_microdata_sample/ 



for comprehensive details). The methodology we present here could also be used to fit sta- 
tistical models to PUMS. 

Typically, one would perform SAE on smaller geographies than states, such as at the 
county or census-tract level. Our reason for analyzing data using each state as a unit is 
that currently the Google Trends data is available at the state level (although one can also 
obtain search data for the ten largest cities in any state). For any particular problem, social 
scientists may find Twitter data and other social-media data at smaller geographies than the 
state level, to which our SAE methodology could be applied. 

The structure of this paper is as follows. We first introduce the motivating data in 
Section |2j We provide the methodological details of our approach in Section [31 and we 
demonstrate the variance-reduction properties of our model via simulation in Section HJ An 
analysis using these techniques in the context of ACS data on changes in household Spanish- 
speaking is given in Section |5j We close with a discussion in Section |6j 



2 Motivating Data: The American Community Survey 

The rate of change of Spanish-speaking persons in the home in different areas of the country 
may provide insight into immigration patterns as well as provide a marker for socio-economic 
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factors. The standard errors of the ACS estimates for language variables tend to be larger 
than most other variables in the survey, and this is even true at larger geo graphie s , such as 



20121 ) as 



the state level. To improve estimates, we incorporate Google Trends data (iGoogld . 
auxiliary information in a framework that uses a spatial FH model with functional covariates. 
Google Trends provide state-level weekly time series indicating scaled search loads in various 
categories (e.g., see Figured]). 

By considering Google Trends searches that contain commonly used Spanish words, we 
are able to develop a proxy measure for household Spanish-speaking. It is reasonable to as- 
sume that individuals who speak Spanish at home are more likely to perform internet searches 
in Spanish. The ubiquitous presence of Google, as well as many social-media services, make 
these searches a readily available source of data. 

When determining which Google Trends data should be used as a proxy for the pattern of 
household Spanish speaking, our approach was to analyze the Google searches of relatively 
common Spanish words. Several candidate words were selected, and we found relatively 
high search volume for the words "y," "el," and "yo," which mean "and," "the," and "I" in 
English, respectively. These words rarely appeared in searches in other languages. We base 
our simulation study (Section @J and application (Section [5]) on these search results. 

Google Trends data present several issues that must be addressed prior to analysis. The 
first issue is related to the way that Google Trends data are defined!^ Although they can be 
scaled and nor malized to a fixed time point by state, the raw data cannot be directly accessed 



((Google, 



2012l ) . This means that the values of the Google Trends data cannot be compared 



between states, and only within state comparisons are valid. To remedy this problem, we fix 



5 The Google Trends data used in this article were downloaded prior to October 2012. Subsequently, 
Google changed the normalization applied to the data and, therefore, the Google Trends data, as presented 
here, are no longer available for download; however, they are available upon request from the corresponding 
author. Nevertheless, all of the results presented in this article are equally applicable to the currently 
available Google Trends data. 
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the time frame of 2008 — 2009 as our period of interest. We standardize each curve to have 
a within-curve mean of zero and a within-curve standard deviation of one. This results in 
curves with the same scale from state to state, which facilitates extraction of curve features, 
rather than spurious differences in magnitude. 

Because we have considered search loads from 2008 — 2009, we need to perform some 
standardization of the outcome. The outcome we consider for each state is defined as 

% households speaking Spanish in 2009 — % households speaking Spanish in 2008 

% households speaking Spanish in 2008 

The western and eastern halves of the country may behave differently with regard to rates- 
of-change of Spanish-speaking; so, for illustration, we restrict our analysis to 20 states and 
the District of Columbia in the eastern half of the United States. This yields 21 locations 
of interest, many of which have traditionally had a low number of native Spanish speakers. 
As a consequence, relatively large changes may appear, but the margins of error (MOE) for 
the ACS estimates of Spanish speaking tend to be larger in the eastern half of the country. 
Considering small areas in the eastern half gives the FH model the potential to provide a 
great deal of improvement when compared to the public- use ACS estimate. 

Iowa, Mississippi, Arkansas, Virginia, West Virginia, Delaware, Rhode Island, Vermont, 
New Hampshire, and Maine are excluded from our analysis. There were two reasons that 
a state was excluded from consideration. The first is that the search load for more than 
20% of the weeks under consideration did not meet the threshold that Google Trends uses 
to indicate search loads. When the threshold was not met, Google Trends reports the value 
to be zero. Removing states with 20% or more zeroes helped to mitigate Google Trends' 
censoring of the data. The second reason a state was eliminated was because after January 
1, 2010, Google Trend s redefined, a nd presumably improved, their algorithm for tagging 



searches to a location ((Google 



20121 ) . Certain states, such as Virginia, exhibited markedly 



different behavior after that date, which casts doubt on the accuracy of the search loads 



during the period 2008-2009 that we considerd. Thus, we excluded these states from our 
analysis. The final count of the small areas is n — 21, and they are listed in Table [TJ 

The approach presented here is certainly not unique to estimating rates of household 
Spanish-speaking. Internet searches or social-media sources contain high- dimensional data 
that, in principle, could be used in many applications of SAE, thus increasing the auxiliary 
information that could be used to improve survey-based estimates. 

3 Functional Covariates in the Fay-Herriot Model 

The model we propose can be viewed as an extension of the traditional FH model. Specifi- 
cally, we propose including functional covariates as a source of auxiliary information, along 
with a random effect that captures spatial correlation. To model the spatial correlation, we 
use a CAR structure. 

For % — 1, . . . , n, the traditional FH model is given by 

Yt = 9 i + e i , (2) 
9i = Po + x'iPx + Ui, (3) 

where e« ~ N(0,af) and itj ~ N(Q, a%), with all error terms, {ej} and {uj}, independent. 
Here, Q{ is the superpopulation mean of the parameter of interest for small area i, Yi is a 
design- unbiased estimate of 9i, and the variance of e*, of, is estimated based on the survey 
design and assumed known. The auxiliary information at small area % is a g-dimensional 
vector of scalar covariates denoted by Xj, with associated parameters (3 X and the intercept 
is given by f3 . 

There is an alternate representation of ([3]): If we let [A\B] represent the conditional 
distribution of a generic random quantity A given the generic random quantity B, then ([3]) 
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can be written as: 



We call the di stribution \Yj \ 9j , of ] the " data model" following the hierarchical modeling 



terminology in 



Cressie and Wikld (120111 ) in order to clarify that the data responses are 



specified conditionally on the superpopulation mean and sampling error. 
3.1 Dimension-Reduced Functional Covariates 

Let Zij (t) denote the j-th functional covariate defined over time domain T and associated with 
the i- th small area. Note that one could also include spatially cor related functional covaria tes 



(e.g.: 



Baladandayuthapani et al 



Holan et al. 



2010 



2012 ) in 



20081 ) or image covariates (e.g. 
this framework. However, for illustration, we focus here on temporal functional covariates. 
An extension of model (El), that includes J functional covariates, can be written as 



3=1 JT 



Ui 



l,...,n, 



(4) 



where {/3j(t) : t G T} is a square-integrable functional parameter associated with the j-th 
functional covariate. Now, for each j, assume that {4>jk{t) : k — 1, . . . , oo} forms a complete 
orthonormal basis in T. Then, we have the unique representation, 



(5) 



k=i 



where {£ij(k) : = 1,2,...} are expansion coefficients of the j-th functional covariate 

associated with the i-th small area. We also have the unique representation, 



(6) 



fe=i 



where {bj(k) : k — 1, 2, . . .} are the expansion coefficients of fij(-), the j-th square-integrable 
functional parameter. From the orthonormality property of the basis functions and upon 
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substitution of (EJ) and fl§D, Q can be alternatively expressed as: 

J oo 

Oi = Q + ^^b j {k% j {k)+^ x + u i . 



(7) 



tiona. 



j=l k=l 

In principle, any complete orthonormal basis set could be used to represent th e func- 
covariates. I n our analysis, we utilize a Karhunen-Loeve (K-L) expansion; see 



( 12002 



Jolliffc 



Chapter 12), 



lowing 



Cressie and Wikle! (12011 



Cressie and Wikle! (12011 



Chapters 4, 5), and the references therein. Fol- 
Chapter 5), assume {%(•)} are stochastic processes with 
E(zij(t)) = 0, and for t,t' G T, define the temporal covariance function for the j-th func- 
tional covariate as Coj(t, t') = E(Zij(t)Zij(t')), which is assumed to be invariant across small 
areas (see Cressie and Wikle, 2011, p. 267, for an analogous definition of a spatial covariance 
function that is invariant in time). Thus, the subscript "0" serves to remind us that this is 
effectively a spatio-temporal covariance function for "lag 0" in space and is invariant over 
all spatial small areas. Then, assuming this covariance is continuous and square-integrable, 



we can write 



c ,j(t,t') = y^ j \jk4>jk(t)ipjk(t'), 



k=l 



where A^i > Xj2 > ■ ■ • are the eigenvalues and {ipjk(-) '■ k = 
eigenfunctions that solve the Fredholm integral equation (e.g., 



2, . . . j are the orthonormal 



Papoulisl . 



1965 



p. 457-461), 



C 0d (t, t')i) jk (t')dt' = \ jk i> jk (t); k = 1, 2, . . . , t e T. 



T 



Because the eigenfunctions, {ipjk{-) '■ k — 1, 2, . . .}, form a complete orthonormal basis, %(t) 
can be written as, 



(9) 



fc=i 



where {£ij(k) : k = 1,2,...} are uncorrelated, mean-zero, variance {\j k : k = 1,2,...} 
random variables, respectively. Thus, one can see that the K-L temporal basis functions 
{ipjkif)} in ([nD play the role of the general temporal basis functions {(j>jkit)} in ©• 



In practice, for T discrete times i 2 , . . . ,tj-}, the empirical temporal basis functions, 
tj)j k = (i/)jk(ti), ...,i/)jk(tT))', are obtained from a numerical solution of flS}. For cases where 



the discrete times are equally spaced, this is equivalent to solving the spectral 



sition of the empirical temporal covariance matrix (e.g. 



Cressie and Wikle 



2011 



decompo- 
, Chapter 



5): C j = SSfjAj&p where *j = \$> jX , . . . , i} jT }, Aj = diag(Aji, . . . , X jT ), and C j = 

(n - l) -1 Er=i( z *i - fij)( z ij ~ PjYi for %j = n ~ l Er=i z ij and z u = ■ ■ ■ > MM)'- 

Note, in some applications, one may consider /x^- = ju.yl, where ju.j is the grand mean, 
ju.j = (nT) -1 E" =1 E^Lx Zij(t). A comprehensive discussion of issues associated with the 
calculation of empirical basis functions in the discrete K-L framework can be found in 

Chapter 5). 



Cressie and Wikle 



(120111 



In practice, the summation in (j7|) must be truncated, such that 

J Ki 

Bi =fa+J2I2 6 ;(%-(*o + x 'a + «i. ( io ) 
i=i fe=i 

where fC,- < T. Then equations (J2]) and (|TU|) together represent a FH model that includes 
both scalar and functional covariates. Typically, Kj is chosen such that some predetermined 
percentage (e.g., 95%) of variation in the function is retained. That is, Kj is the smallest 
integer K <T such that Y^k=i ^jk/ Yl1=i ^jk > 0.95. However, in our framework, this only 
represents an initial phase of dimension-reductio n. Subsequent dimension-reduction proceeds 



by stochastic search variable selection (SSVS) ( iGeorge and McCulloch 



1993 



1993). 



Our Bayesian SSVS requires prior distributions for the components of hj = (bj(l), . . . , bj(Kj)Y , 
j = 1, . . . , J, and of (3 X in (fTUj) . In general, when interest resides in a substantial number of 
submodels, as is the case in the ex amples we con sider, SSVS algorithms provide an effective 



means of model selection (e.g., see 



Georgd . 



2000 



for a comprehensive overview). For exam- 



ple, recall that (3 X = (Pi, fa, ■ ■ ■ , P P Y consists of p (potential) covariates. Consider the prior 
distribution, 



fofri ~ 7^(0, c e rf) + (1 - le )N(0, r 2 ); t=\,...,p, 



IF 



where conditional independence of {/3e} is assumed, and 7^ are specified at the next level of 
the hierarchy to have independent Bernoulli (tti) distributions, with parameter < tt^ < 1, for 
£ — 1, . . . ,p. In this context, ir£ represents the prior probability that should be included in 
the model, and 7^ = 1 indicates that the £-th variable (£ = 1, . . . ,p) i s included in the model- 



Now, typica lly, q, T£, and m are taken as fixed hyperparameters; 



(11993 



George and McCulloch 



19971 ) present several alternatives for their specification. Specifically, they recommend 
taking Te to be small so that when 7^ = it is sensible to specify an effective prior for 
that is close to zero. Additionally, in general, it is advantageous to take q to be large 
(greater than 1) so that if 7^ = 1, then the prior favors a non-zero 0£. For j = 1, . . . , J, 
selection of the elements of b, proceeds in an identical manner to selection of the elements of 
f3 x , with prior mutual independence between {hj} and /3 X assum ed. For further discussion 



surrounding SSVS as it relates to functional data modeling, see 
and the references therein. 



Holan et al 



(12010 



20JJ) 



3.2 Spatial Random Effects 

In extensions of the basic FH model, the vast majority of papers in the literature assume 
independent Gaussian latent random effects for u = (ui,...,u n )'. Instead, the model 
we propose assumes spatially correlated random effects based on the CAR model. Other 
spatial models, such as SA R models and geostatistical models have been used (see, e.g., 



Sengupta and Cressie 



20131 . for a rev iew and compariso n of these ). CAR modeling fo r es 



timation in small areas dates back to 



Besag et al. 



(I199ll ); see also 



Leroux et al. 



(11999h and 



MacNabl (120031 ). who utilize such a model to estimate rates for non-rare di seases in smal. 



areas. The CAR model has also been employ ed in the FH structure (e .g. 



Cressie 



Gomez-Rubio et al. 



2010 



You and Zhou 



201 ll ). In addition 



19901 . 



Torabil (120111 ) has implemented 



the intrinsic CAR (ICAR) model to account for the spatial effects in a spatio-temporal hier- 
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archical Bayesian FH model. We utilize the same ICAR structure here, now in the presence 
of functional covariates. 

The use of an ICAR prior allows the latent spatial characteristics of the data to be 
modeled directly, whi ch facilitates the bo rrowing; of strength across spatial units. The ICAR 
formulation is due to 



Besag et al. 



( Il99ll ). In this setting, define 



u 



w i+ w i+ 



(12) 



where the notation "i ~ f denotes that small areas i and j are neighbors (e.g., they share 
a border), and the term u>j + indicates the number of neighbors associated with small area 
%. The IC AR model defined b y (Tl2|) yields an Intrinsic Gaussian Markov Random Field 



(IGMRF) flRue and Held 



20051 ). which corresponds to an improper prior distribution in the 



hierarchical model we propose. The precision matrix of this IGMRF has the form 

where is a diagonal matrix with element (i,i) equal to Wi + , the number of neighbors 
of small area i. Further, the (i,j)-ih element of W equals one if small areas i and j are 
neighbors, and zero otherwise. The diagonal of W is set to zero since small area % is not a 
neighbor of itself. 

The improper prior is due to a linear dependency in the columns of (D^ — W), which 
can be seen by post-multiplying this matrix by a vector of ones and noting that it yields 
a vector of zeroes. Despite its impropriety, the ICAR prior distribution is often used, as 
it yields a proper posterior distribution for many commonly used data models, such as the 
Gaussian, Poisson, and Binomial distributions. The ICAR prior implies a smoother spatial 
process than can be obtained from a CAR prior, which facilitates more borrowing of strength 
between spatial units. A "sum-to-zero" constraint, XX=i Ui = 0> ^ s nee ded allow the intercept 
term in the model to be estimable; if not enforced, the intercept and spatial latent effects 
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ar e linearly depe n dent. Fast algorithms for sampling u subject to Y^i=i u i = ® can be found 



m 



Rue and Heidi (120051 ). which we will need in our analysis.. 



In conjunction with a Gaussian data model, the ICAR prior yields a proper Gaussian 
posterior distribution for {uj : i = l,...,n}. This makes the ICAR (and CAR models 
in general) convenient for modeling the spatial dependency in the FH framework, where 
Gaussian data models are typically assumed. In a hierarchical modeling framework, of which 
the FH model is a special case, the posterior distribution is often simulated using a Gibbs 
sampler, made up of a sequence of Gibbs steps. When an ICAR or CAR prior is used with 
a non-Gaussian data model, some of the Gibbs steps are more likely to involve Metropolis 
sampling or numerical approximations. 



4 Simulation Study 

The simulation study we consider is designed to evaluate the performance of our model (j2J), 
fTTU]) . ffTTj) . and (TT2"j) using simulated data that is calibrated to behave like our motivating 
ACS example. In particular, we consider the effect of using functional covariate information 
and spatial correlation, both within the FH framework. We assess performance in terms of 
reduction of variance of the small area estimates. 

Using the expansion coefficients from (flOl) . based on the detrended time series (see Step 
2, Appendix A), we generated 250 data sets according to the algorithm found in Appendix 
A. For each data set we performed a FH-CAR-SSVS analysis and our MCMC algorithm 
consisted of 100,000 iterations with the first 2,000 discarded for burn-in. As mentioned, 
the FH model yields a guaranteed decrease in estimation variance, and the long chains are 
required to achieve sufficiently low Monte Carlo error that these variance improvements can 
be verified. In this setting, all of the full conditional distributions are of standard forms and 
straightforward to derive. As such, Gibbs sampling was used for all model parameters. The 
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full conditional distributions of the parameters can be found in Appendix B. The model used 
for simulating the data {Yi} is 

% = 6i + ei 

K 

Oi = A) + X} 6 (*0£(*) + «i> 
fc=i 

where £i(k) is derived from %{£) — z, with %{t) being time series simulated to behave like 
the Google Trends curves for the search term "el" according to the algorithm in Appendix A 
(i.e., see Step 5 of Appendix A). That is, the data are simulated using just one curve for each 
small area. Finally, for this simulation, {u{\ where assumed to follow the ICAR structure 
specified in f[T2~j) with parameters detailed in Step 8 of Appendix A. 

The estimated model for each of the 250 data sets consisted of (j2J), with the following 
model for 9i 

13 

^ = A) + X^&W + n - 

In this case, {ui : % = l,...,n} follows the ICAR model given in (fl2"]) . with a\ ~ 
7G(.001, .001) and a "sum-to-zero" constraint imposed on the elements of u. Finally, we 
assumed O ~ N(0,af), with aj ~ 7(3(0.001,0.001). 

Our primary interest is the reduction in the variance associated the estimate of the survey 
quantity of interest. To evaluate the variance reduction, we compute 

°1 ~ x 100%, (13) 

°i 

for % = 1, . . . , n. For each of the 250 simulated data sets, three analyses were performed. The 
first analysis was performed using the Spatial FH model with functional covariates described 
in ( TlQl) (henceforth called the "SFFH" model). The second model was a FH model with 
functional covariates and independent Gaussian spatial effects, as is typically done in the 
FH framework (henceforth called the "FFH" model). The third model includes an ICAR 
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prior on the latent spatial effects, but ignores the functional predictors and contains no 
auxiliary information (henceforth called the "Spatial Only" model). In general, one would 
also include a nonspatial nonfunctional FH model that utilizes only scalar covariates, but 
we choose not to since our simulated data does not include scalar covariates. Estimates of 
the mean of the variance reductions can be found in Table [TJ 

As illustrated in Table (TJ the spatial only model does not perform as well as the FH 
models containing auxiliary information (i.e., the FFH and SFFH models). This is not 
surprising, because the auxiliary information is key to the reduction of variance in the FH 
model. However, note that spatial autocorrelation is important in this simulation study. 
There was greater variance reduction in 18 of 21 locations when using the SFFH model 
with functional covariates as compared to the FFH model. Two locations that did not 
show improvement were the District of Columbia and Maryland. The District of Columbia 
only borders Maryland (recall we removed Virginia from consideration), and Maryland has 
only one other neighbor (Pennsylvania). The relationship between Washington D.C. and 
Maryland is the likely explanation for these two locations performing worse in the case 
where a spatial structure is included in the model. Minnesota is the third location, and 
Minnesota has only a single neighbor, which leads to poor spatial fitting (e.g., large mean 
squared error of the latent spatial effect). We conclude that, while the spatial structure 
does not guarantee a reduction in variance at every location in the FH framework, an ICAR 
model is an effective way to achieve a reduction in the average estimation variance when 
spatial autocorrelation is present. 

Additionally, our SSVS algorithm selects several eigenfunctions of the search term "el" 
to be of interest. The primary eigenfunctions of interest are of higher order: the sixth, 
ninth, and eleventh principal components are selected in 61%, 62%, and 65%, respectively, 
of the 250 models fitted. Additionally, the fourth, tenth, twelfth, and thirteenth principal 
components are selected in 55% to 60% of the 250 models fitted. This indicates that high- 
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frequency components in the models are important. The first three principle components 
are selected in fewer than 50% of the 250 models fitted, indicating that these low-frequency 
components are selected less often relative to the high-frequency components and less often 
than the prior would suggest. This provides further evidence towards the importance of 
functional covariates and in particular their high-frequency components. 

5 Google Trends Data to Improve ACS Estimates 

Recall that we utilize a prior distribution for SSVS that consists of a mixture of normals to 
distill the important features of the functional covariates of the searches for "y," "el," and 
"yo" (Section 13. ip . When employing the SSVS procedure, it is typically advantageous to 
ensure that all of the covariates are on the same scale. Otherwise, certain components may 
be selected based solely on their relative magnitude. Therefore, in addition to the standard- 
ization discussed in Section [2j in our model, all 39 principle components under consideration 
(i.e., the eigenvectors : k — 1, . . . , Kj, j = 1, . . . , 3} in Section l3TTj) were standardized by 
subtracting the mean of the component and dividing by the standard deviation of the com- 
ponent. This yielded functional principle components with a mean of zero and a standard 
deviation of one. 

The model we use here differs from the simulation study in that we utilize all three search 
terms "y," "el," and "yo" as our functional information (see Figured]). The model used then 
becomes 

Yi = 6i + ei 

3 Kj 
j=l k=l 

For our purposes, ire m (jlip . the SSVS portion of the model, was fixed at 0.5 for all £, 
as this yields equal contributions to the likelihood whether a variable is included or not, 
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and it can be considered non-informative in this sense. The terms q and Ti were considered 
equal for all components, yielding two hyperparameters, c and r, which were chosen via a 
sensitivity analysis. Specifically, we allowed r 2 to take values 10 -3 , 10 _4 ,and 10 -5 , and c to 
take values 10 and 100. A factorial (sensitivity) experiment was performed in order to select 
the values of c and r for our analysis. In this experiment, we chose the values of c and r that 
yielded the lowest mean posterior variance in a leave-one-out cross-validation scheme. For 
each combination of c and r, one small area at a time was removed for 40,000 iterations, and 
the chain allowed to burn in for 2,000 iterations. The remaining 38,000 iterations for this 
area was then used for inference. The MCMC algorithm was run separately for each small 
area left out. The leave-one-out cross-validation scheme is designed to protect against model 
overfitting. Our factorial design selected r 2 = 10~ 5 and c = 10 as producing the lowest mean 
posterior variance when averaged over all {0i}. 

For c = 10 and r 2 = 10~ 5 , the MCMC algorithm was run for 100,000 iterations with 
the first 2,000 being discarded for burn in. The functional principle components selected 
by the SSVS algorithm in over 50 percent of models were the ninth principal component 
of "y" (77% of models), the first principal component of "y" (55%), the tenth principal 
component of "y" (55%), the seventh principal component of "el" (70%), the fifth principal 
component of "el" (57%), and the tenth principal component of "yo" (59%). All other 
principal components were selected with frequency smaller than the prior 7r=0.5 (ranging 
from 36% to 48%), suggesting that the data selects against these other components. It is 
worth noting that four of the six components selected in over fifty percent of the models are 
high-frequency terms (i.e., the fifth principle component or higher). It would appear that 
high-frequency features of the Google Trends data are the primary predictors of the rates 
of change of household Spanish-speaking. These components may be detecting shocks in 
the search load, indicating large search volumes resulting from instances when some of the 
Spanish speaking community within a state searches for news or other stories of interest. 
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The variance reductions provided by this model can be found in Table [5J and the im- 
provements are illustrated in Figure |2j Additionally, we report the results of an analysis 
using the SSVS with independent random spatial effects and the time series of "y," "el," 
and "yo" (labeled as the "FFH model") as well as a model that only utilizes an intercept 
and spatial random effects with an ICAR correlation structure (called the "Spatial Only" 
model). Of note is the advantage of using the spatial structure in the FH model. The SFFH 
outperforms both the FFH model and the Spatial Only model in 15 of 21 small areas, while 
it performs worse than the other two models in only 3 of 21 small areas. These three small 
areas are the District of Columbia, Maryland, and Connecticut. The issue of poorer estima- 
tion in the District of Columbia and Maryland was previously discussed in the simulation 
study. Based on our simulation study, the poorer estimation of Connecticut is likely due 
to the particular data set rather than a systematic issue. This conclusion is derived from 
the fact that in our simulation, for particular data realizations, states other than DC and 
Minnesota were estimated more poorly using the SFFH model than the other two models 
(FFH and Spatial Only). In short, when using a SFFH model, these states are estimated 
better on average across all 250 realizations, though this is not guaranteed for any particular 
realization. These results argue strongly for the use of spatially correlated latent random 
effects and further demonstrate the utility of functional covariates in the FH framework. 

6 Discussion 

FH models have a celebrated history, owing to their versatility in small area estimation. To 
increase the usefulness of this class of models, we have extended them to include functional 
covariates along with spatial correlation. Importantly, we have demonstrated that functional 
covariates can be effectively utilized to improve estimation in the public-use ACS data. 
Further, we have emphasized the importance of the spatial relationships between small areas 
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in our model, and we have illustrated the importance of a spatial prior in the FH structure. 

The fully Bayesian procedure incorporating the dimension-reducing SSVS provides an 
automated method for feature selection and selection among different candidate models. The 
model selection is tuned to minimize the variances of {9f i — 1, . . . ,n}. However, it would 
also be possible to consider other model properties, when selecting SSVS hyperparameters. 

The issue of spatial autocorrelation has been addressed systematically, and we have 
illustrated, via model-based simulation and through our motivating ACS data, that priors 
inducing spatial autocorrelation yield greater reduction in estimation variance than non- 
spatial priors. We have also found that the reduction in variance is not guaranteed to be 
greater for every location. Even so, these results argue strongly for spatial priors to be used 
in the FH framework. 

Due to data limitations, we have applied our approach using Google Trends data that 
are available at the state level, but not for smaller areas. Twitter data are another source 
of functional covariates, and they are available at finer spatial resolutions. However, the 
drawback of using Twitter data is that they are not as readily available. Finally, our model 
is also generally applicable with image data, such as remotely sensed scenes of land- use /land- 
cover, which may result in a key use of this technique in agricultural surveys. 
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Appendix A: The Simulation Algorithm 

The following algorithm was used to generate the functional covariates and the data for the 
simulation study presented in Section HI 

Step 1: Consider the Google Trends time series for the search term "el" at location i. 
Denote this quantity by Zj = (z{(ti), z^tx))'- Let Z = [zi, z n ] be a T x n matrix 
containing the Google trends time series associated with the search term "el." 

Step 2: Subtract the time-dependent mean of the matrix Z, namely z = ^ _1 (^" =1 Zj), from 
each column of Z to obtain Z*, a matrix of detrended time series. 

Step 3: Consider the T x T empirical covariance matrix S* = Z*Z* /(n - 1). Let S* = 
<fr*A*<fr* be the usual spectral decomposition of S*. Here, 3>* represents the discretized 
eigenf unctions for the functional covariate "el." 

Step 4: Project the detrended time series onto these eigenfunctions: A = Z*. Let the 
scale of the eigenfunctions be denoted by r = diag(AA'/r2). 

Step 5: Generate a new set of functional curves, z \ = z + where ~ MVN (0, r), 

and define Z = [zi, . . . , z n ]. 

Step 6: Next we simulate a set of responses. First, obtain the weighted least squares 
estimates b* from Y = <fr*b* + e, where b* denotes the vector of coefficients as- 
sociated with the Google Trends functional time series for the search term "el," 
e ~ MVN (0, diag(of , . . . , cr^)), and Y denotes the n-dimensional vector of observed 
small-area responses from the ACS, namely ([T|). 

Step 7: For the new functional covariate Z, perform Steps 1 - 3 in order to obtain simulated 
discretized eigenfunctions, $ , for the simulated functional covariate. 
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Step 8: Generate simulated responses Y according to Y = $ b* + u + e, where u ~ 
ICAR(al), al ~ JG(21/2, 0.004), and e ~ MVN (0, diag(a 2 , . . . , a 2 )). The generating 
distribution for a, 2 was chosen to yield variances similar to those estimated from the 
ACS data analyzed in Section 6. 

Appendix B: Full Conditional Distributions 

Here we provide the forms of the full conditional distributions for the SFFH model utilized 
in Section [5j We define T as a diagonal matrix with T« = cr 2 7^ + r 2 (l — 7^) and E e to 
be diagonal matrix with S eii = of. The term S = [£i(l), . . . , £j(Kj)] denotes a matrix 
with columns £j(k) = (£ij(k), . . . ,Cnj(k))'. The scalar n represents the number of locations 
under consideration. For our analysis, the value is 21 and we let b = (b' 1; . . . , bj)' denote 
the concatenated vector of {bj}. The scalar K = ^ . Kj represents the dimension of b and, 
for our analysis, this value equals 39. The scalars a\ and 02 denote the shape and scale 
parameters in the IG(ai,ci2) prior for cr 2 and cr 2 . For our analysis we set a x = a 2 = 0.001. 
Under this notation, the full conditional distributions have the following forms. 

1. b ~ MVN(fj, b , E 6 ), where E b = (S'E^S + T" 1 )^ 1 and fi b = EfcS'E^y - l/3 - u). 

2. u ~ MVN(fj, u ,n u )I {j:Uui=0} , where ft u = (E7 1 + v?{D w - W})~\ Mu = 
f2 u E7 1 (y — l/3o — 3b), and /{.} denotes the indicator function. 

3. For j = 1, . . . , J and £ = 1, . . . , pj, 

,, ~ Bern ( IMlil Z D ^ 

7j< l/feta = i) + /fete = o); ' 

where /(•) is the pdf of the normal prior associated with bjg, and Bern(p) denotes 
a Bernoulli distribution with probability p. For model identifiability, we require the 
number of selected {bji} be less than the number of locations (i.e., Yliiljt — 20). 
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In cases where the SSVS prior selected this number greater than 20, the set {jjt} 
was re-sampled. Note, this occurred infrequently (i.e., in less than 10 percent of the 
samples) . 

4. al ~ IG{a x + n/2, a 2 + u'(D ffl - W)u/2). 



5. O ~ N(^ ,aj o ), where a% = {I'^l + a% )~ l and H , = a^l'^y - Hb - u). 

6. aj o ~/G(a 1 + l/2,a 2 + /3 2 /2). 

Finally, the inclusion of scalar covariates is straightforward. That is, sampling (3 X in (jlj) using 
an SSVS prior wou ld proceed in a similar manner to sampling the functional covariates (see 



Holan et al 



20121 . for an example) 
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State 


SFFH 


FFH 


Spatial Only 


A 1 1 

Alabama 


52.8 


44.0 


14.1 


Connecticut 


14.8 


13.7 


3.3 


District of Columbia 


65.7 


75.1 


38.9 


Florida 


1.4 


1.4 


0.3 


Georgia 


11.8 


7.6 


1.8 


Illinois 


5.3 


3.2 


0.7 


Indiana 


31.9 


23.4 


6.3 


Kentucky 


64.5 


51.2 


18.6 


Maryland 


19.7 


21.1 


6.0 


Massachusetts 


12.3 


11.3 


3.0 


Michigan 


20.6 


13.9 


4.0 


Minnesota 


2.41 


29.6 


9.4 


Missouri 


39.0 


31.2 


10.2 


New Jersey 


O.o 


K 9 


1 9 
l.Z 


New York 


2.8 


1.9 


0.4 


North Carolina 


11.2 


8.6 


1.9 


Ohio 


33.1 


23.9 


6.6 


Pennsylvania 


21.6 


14.9 


3.7 


South Carolina 


34.2 


30.9 


8.9 


Tennessee 


45.4 


32.6 


9.8 


Wisconsin 


28.1 


22.7 


6.1 



Table 1: Mean relative percentage decrease in variance estimates for the 21 small areas based 
on 250 simulated data sets for the spatial FH model with functional covariates (SFFH), the 
standard FH model with functional covariates (FFH), and a FH model using only spatial ran- 
dom effects (Spatial Only). Bolded values indicate the greatest variance reduction. Variance 
reduction is computed by Equation ( 1131) 
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State 


SFFH 


FFH 


Spatial Only 


a 2 


Alabama 


34.3 


30.8 


26.7 


2.014e-3 


Connecticut 


1.0 


9.3 


9.6 


3.472e-4 


District of Columbia 


60.8 


67.2 


68.5 


7.268e-3 


Florida 


1.8 


2.0 


0.1 


3.000c-5 


Georgia 


11.8 


7.2 


5.9 


1.819e-4 


Illinois 


4.6 


3.1 


2.5 


7.204e-5 


Indiana 


24.5 


19.4 


18.6 


6.883e-4 


Kentucky 


44.7 


39.6 


43.7 


2.487e-3 


Maryland 


17.0 


18.5 


17.0 


6.732e-4 


Massachusetts 


9.7 


9.8 


9.0 


3.159e-4 


Michigan 


18.0 


10.8 


0.2 


5.239e-4 


Minnesota 


19.4 


22.2 


26.2 


1.086e-3 


Missouri 


32.2 


27.9 


24.5 


1.224e-3 


New Jersey 


5.8 


4.9 


3.6 


1.184e-4 


New York 


2.9 


1.7 


1.4 


4.159e-5 


North Carolina 


9.6 


8.2 


5.3 


2.030e-4 


Ohio 


31.2 


23.9 


19.1 


7.320e-4 


Pennsylvania 


19.6 


15.6 


11.1 


3.901e-4 


South Carolina 


24.7 


21.4 


21.9 


1.093e-3 


Tennessee 


38.9 


29.8 


24.3 


1.160c-3 


Wisconsin 


19.1 


17.0 


18.0 


6.638e-4 



Table 2: Relative percentage decrease in variance estimates for the 21 small areas for the 
analysis of the ACS data for the spatial FH model with functional covariates (SFFH), the 
standard FH model with functional covariates (FFH), and a FH model using only spatial 
random effects (Spatial Only). Bolded values indicate the greatest variance reduction. Vari- 
ance reduction is computed by Equation (fT3l) . The column with the heading a 2 gives the 
known sampling variance of the relative change in percent household Spanish-speaking for 
each state. 
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Figure 1: Functional curves for the Google Trends search loads of "el," "yo," and "y" (see 
Section [2]). To avoid clutter, we show only the first five time series, in alphabetical order 
(i.e., Alabama, Connecticut, District of Columbia, Florida, and Georgia), for each search 
term. 
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Figure 2: Relative percentage decreases in functional Fay-Herriot model variance versus 
ACS variance for the SFFH (upper left), the FFH model (upper right), and the Spatial Only 
model (lower left). 
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